Abstract

In this work we describe both a type checking and a type inference
algorithm for generic programming using the spine view of data. The
spine view of data is an approach to decomposing data in functional
programming languages that supports generic programming in the style
of Scrap Your Boilerplate and Stratego. The spine view of data has
previously been described as a library in a statically typed language
(as in Haskell), as a language feature in a dynamically typed language
(as in Stratego), and as a calculus of patterns (as in the Pattern
Calculus). The contribution of this paper is a type inference
algorithm for the spine view and a type relation that underlies this
inference algorithm. In contrast to all other typed implementations of
the spine view, the type inference algorithm does not require any type
annotations to be added in support of the spine view. This type
inference algorithm is an extension of Hindley-Milner type inference,
thus showing how to introduce the spine view of data as a language
feature in any functional programming language based on
Hindley-Milner.

Categories and Subject Descriptors D.3.3 [Programming Languages]:
Language Constructs and Features: patterns, data types and
structures; F.3.3 [Logics and Meanings of Programs]: Studies of
Program Constructs: functional constructs

Keywords spine view; pattern matching; generic programming; 
FCP; type inference 

1. Introduction 

Generic programming is desirable because it reduces the amount
of boilerplate code required to encode certain programs. Interest in
generic programming has resulted in many implementations with
many subtly different characteristics. In Haskell there are multiple
libraries [2, 9, 14-16]. Stratego [19] is a language built to support
generic programming via strategic programming. The bondi
programming language [7] supports generic programming via advanced
pattern matching. Clean has a system of generic programming [1] as
an extension to the language that views data as a sum of products.
Although there are many systems of generic programming in many
languages, there are none that define a simple interface to the spine
view of data [3, 5] and on which we can directly perform type
inference. We provide both of these.

Defining a type inference algorithm for a language feature is
desirable because it means adding that feature does not force any
further type annotations. In languages like Haskell where type
inference is part of the language, this is clearly very important.
However, in any language it is useful to know that adding a spine
view will not necessitate any further notational overhead. Although
the type annotation overhead of generic programming has proven
small, this is due to the very complex type systems in which it is
typically deployed. In this work we describe a very small
extension to Hindley-Milner.

A system of generic programming requires three ingredients
[4]: a generic view of data, a system for functions that dispatch
on a type argument (which we will refer to as “type indexed
functions”), and a run-time representation. All are necessary for
generic programming, but they are orthogonal; one can be considered
in isolation from the others.

In this work we are concerned only with the generic view 
of data. There are multiple possible approaches to type indexed 
functions and run-time representation, and any of them could be 
combined with the work we present here. In fact, in other work, we 
have combined the inference algorithm in this paper with a system 
for type indexed functions and a run-time representation, resulting 
in a full generic programming language called dgen [18]. All of 
the work presented in this paper has been implemented in the dgen 
compiler. 

Generic programs are primarily of interest in functional
programming languages, where the (generalised) algebraic data type
view of data normally requires lots of boilerplate for working with
large data types. Furthermore, it is in statically typed functional
languages where the problem is most acute because dynamically typed
languages often allow run-time solutions to the problem. While
generic programming is not restricted to statically typed functional
languages, it is in this domain that we will consider it.

For languages based on Hindley-Milner type inference there
are two functional language features that are common and
well-understood, but not standard, that must be available for generic
programming via the spine view to work. We have previously shown
that to allow generic programming with the spine view requires
both polymorphic recursion and higher ranked types [18]. These
features are not difficult to achieve in general, but for type
inferencing systems they are not available without either type
annotations [13, 17] or witnessing constructors [10]. We will show how
to add generic programming to a type inferencing functional
language that uses witnessing constructors for polymorphic recursion
and higher-ranked types.

In Section 1.1 we formally describe the spine view of data.
In Section 2 we describe FCP q, our language which supports the
spine view of data. Our description includes a formal operational
semantics which is used in Section 3 to describe a sound type
relation for this language. This type relation is simple and novel.
The main contribution of this work is a type inference algorithm for
FCP q given in Section 4. Section 5 describes related work, Section
6 gives future directions, and Section 7 concludes the paper.

Figure 1. The fully applied constructor view of the Tree value
Node (Node Leaf 4 Leaf) 2 (Node Leaf 3 Leaf).

1.1 The Spine View of Data 

We are going to show in this paper how to perform type inference
for a particular generic view of data included in a functional
language. The view of data we are concerned with is the spine view
of data. The spine view is the mechanism that underlies the “Scrap
your Boilerplate” series of papers [14-16] and the pattern calculus
[7], and it has been described explicitly by Hinze et al. [3, 5]. In
this section we formally describe the spine view, comparing it to
the way we normally consider data in functional programming
languages.

In functional programming languages we can think of a data 
constructor as a function that takes its arguments and returns a value 
of its data type. If you give that function only some of its arguments, 
it remains a function. However, if you give it all its arguments, you 
get a data value which can be pattern matched against. 

For example, in Haskell, a data type definition

    data List a = Cons a (List a) | Nil

introduces two constructors:

    Nil :: List a
    Cons :: a -> List a -> List a

which can be matched against in case statements and function
definitions:

    length Nil = 0
    length (Cons x xs) = 1 + length xs

It is not possible under this view of data to pattern match 
against a partially applied constructor such as Cons 1. We call 
this the “fully applied constructor view” of data. Although some 
languages which use the fully applied constructor view will accept 
a term like Cons 1, that term is not a data value, it is a function 
which is waiting for its last argument; i.e. \xs -> Cons 1 xs in 
Haskell. According to this view, we can think of data as a tree with 
the constructor at the root and its arguments as its children. The 
number of children depends on the number of arguments for that 
constructor. Consider the following Haskell data type definition: 

    data Tree a = Node (Tree a) a (Tree a) | Leaf

Tree is a binary tree which stores data in the nodes. To traverse
the data structure we need to know which constructor we are at and
thus how many children it has. This style of traversal is achieved
with standard pattern matching case expressions where one
alternative is given for each possible constructor. Under this view an
individual function is then restricted to operating over one (perhaps
parameterised) type because we must enumerate its constructors.

In contrast, the spine view of data considers a constructor with
only some of its arguments to be a data value which can be matched
against. Each argument applied to the constructor is reified in the
structure as a “data application node”. These data applications are
just like function applications and the types work out the same, but
run-time support for them is required. This approach to data is called
the “spine view” because drawing the value with its application nodes
creates a “spine” of applications off which the values applied to the
constructor hang. This is a view of data where all data is a binary
tree of data application nodes.

Figure 2. The spine view of the Tree value Node (Node Leaf 4
Leaf) 2 (Node Leaf 3 Leaf).

Figures 1 and 2 show an example data value under the two
different views. We assume integers are an infinite number of nullary
constructors. Under the spine view every non-leaf node has two
children, regardless of how many children the constructor is
defined to have. This is the central feature of the spine view:
internal nodes are tuples (or pairs) and leaf nodes are nullary
constructors. This means that two cases are enough to pattern match on
any data using the spine view. The spine view presented here is suited
to generic consumers, not generic producers; other work addresses
how to extend it to that domain [3].

In this work we expose the spine view of data with a single
new primitive expression, the is-pair expression. The expression
ispair d bind (x, y) in e else f will scrutinise the value d and, if it
is an internal node by the spine view (o in Figure 2), will bind x to
the left child and y to the right child, evaluating the in branch e. If
the scrutinised value is a leaf node by the spine view (Node, Leaf, 2,
3, or 4 in Figure 2), no new binding is made and the result is the else
branch f. In the remainder of this paper we describe a simple language
that supports the spine view via an is-pair expression, and show a type
relation and a type inference algorithm for this expression.

The simplest thing you can do with the spine view of data is to
pull an arbitrary value apart and put it back together again:

    λd.ispair d bind (x, y) in x y else d

This function takes in any value and scrutinises it to see if it is a
constructor applied to something. If it is, x is bound to the
constructor and all its arguments but the last, y is bound to the last
argument and the in branch is the result. For example, for the value
Node (Node Leaf 4 Leaf) 2 (Node Leaf 3 Leaf) (shown in Figure 2),
x is bound to Node (Node Leaf 4 Leaf) 2 and y is bound to
Node Leaf 3 Leaf. In the in branch, the constructor with a missing
argument is applied to that argument, which re-attaches the final
argument to its constructor. The end result is that the initial value
is returned. If the scrutinee is not a constructor applied to
something, then it must be a constructor by itself and it is returned
as is. Thus this function should have the type a → a. Our type
inference algorithm is able to infer this type with no type
annotations.
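
The pull-apart-and-rebuild behaviour described above can be sketched in Haskell with an untyped model of the spine view. This is an illustration only, not the FCP q calculus: the names Val, Con, Lit, ispair, app, rebuild and exampleTree are assumptions introduced here, every value lives in a single universal type, and so the sketch says nothing about type inference.

```haskell
-- Untyped model of the spine view (illustrative names, not from the paper).
-- A value is a constructor with the arguments attached so far, or an integer.
data Val
  = Con String [Val]  -- constructor name and arguments attached so far
  | Lit Int           -- integers behave like nullary constructors
  deriving (Eq, Show)

-- ispair d inBranch elseBranch: if d has at least one argument attached,
-- pass (constructor with all but the last argument) and (last argument)
-- to inBranch; otherwise return elseBranch.
ispair :: Val -> (Val -> Val -> r) -> r -> r
ispair (Con c args@(_ : _)) inBranch _ = inBranch (Con c (init args)) (last args)
ispair _ _ elseBranch = elseBranch

-- Data application: attach one more argument to a constructor.
app :: Val -> Val -> Val
app (Con c args) y = Con c (args ++ [y])
app (Lit _) _ = error "app: an integer cannot take arguments"

-- The identity function, written by pulling a value apart and rebuilding it.
rebuild :: Val -> Val
rebuild d = ispair d (\x y -> app x y) d

-- The Tree value Node (Node Leaf 4 Leaf) 2 (Node Leaf 3 Leaf).
exampleTree :: Val
exampleTree = Con "Node" [leafNode 4, Lit 2, leafNode 3]
  where
    leafNode n = Con "Node" [Con "Leaf" [], Lit n, Con "Leaf" []]
```

In this model, rebuild exampleTree returns exampleTree unchanged, and scrutinising exampleTree with ispair binds the second component to the representation of Node Leaf 3 Leaf.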

This simple example can be extended to explore some subtleties
of the spine view. If we were to try and write a function which took
in any data (constructor applied to arguments) and stripped its final
argument, it might look something like

    λd.ispair d bind (x, y) in x else ??

We would hope very much that such a function was severely
restricted and indeed attempting to fill the else branch makes it
clear that there is no general value we can put there which would
work for all types. We can write a version where the else branch
results in something more specific (assuming a constructor K of
the type T_K with type K : Int → Int → T_K):

    λd.ispair d bind (x, y) in x else (K 3)

Now we can assign a type to the function, but it can’t be a
polymorphic type. An appropriate type would be T_K → Int → T_K.
Here we see that under the spine view of data, constructors
which are missing their arguments can exist. The type inference
algorithm we present here will compute this type.

Of course, we must be very careful how we use constructors
which have been stripped of some of their arguments: we must
ensure they are only ever applied to values of the right type. Again,
the type system we present here enforces this, allowing for example
terms such as

    (λd.ispair d bind (x, y) in x else (K 3)) (K 2 3) 4

and computing the type T_K for it.

1.2 The Missing Inference Algorithm 

Amongst the accounts of generic programming using the spine 
view there is no system which performs type inference without 
annotation on terms. In GHC/Scrap Your Boilerplate for example, 
inference is done in the presence of type annotations on terms. 
The definition of the underlying combinator gfoldl includes the 
following type annotation: 

    gfoldl :: (forall d b. Data d =>
                 c (d -> b) -> d -> c b)
           -> (forall g. g -> c g)
           -> a -> c a


The terms which encode the spine view, such as gfoldl, require
type annotations and the GHC type inference algorithm is
particularly sophisticated. It is not clear what parts of it need to be
replicated for typing the spine view in particular.

In the bondi programming language an existentially quantified 
variable that needs to be accounted for forces the use of “aggressive 
assumptions” in the inference algorithm and expressions using the 
spine view are also annotated with their types. 

Thus two questions arise: “is it even possible to do type
inference on the spine view without annotation of terms?” and “exactly
what additional machinery is required in a type system to support
inference for the spine view?”. It is these two questions which
motivated the research we present here.

1.3 The Key Ideas of this Paper 

1.3.1 Generic code can be expressed in a very small calculus 

Generic programming is typically demonstrated in quite complex 
systems. This is partly because three non-trivial problems need to 
be solved to achieve convincing demonstrations of the idea (see 
section 2.1). In this paper we build on the work already done 
by many people who have shown the kinds of functions we can 
write in generic programming and we distill it down to a very 
small calculus. This makes it feasible to prove, rather than just 
demonstrate, properties of the system, such as the soundness of the 
type relation. 


1.3.2 The spine view can be encoded with a single expression 

There have been a number of expositions of the spine view (see
Section 5) but they are all wrapped up in much larger systems. The
most notable examples are those generic programming libraries
for Haskell which rely on GHC extensions, and the pattern calculus,
which is a full account of treating functions as data. In this work
we extract just what is needed to support the spine view of data.
We describe a single extension to the lambda calculus (and systems
based thereon), the is-pair expression, which is sufficient to support
the spine view.

1.3.3 Encoding the spine view with an is-pair expression 
provides a place to introduce the necessary existentially 
quantified variable 

The spine view of data generates an existentially quantified type 
variable (see Section 3). In other typed accounts of the spine view, 
the treatment of this existentially quantified variable is unsuitable 
for type inference. In this work we show that by using an is-pair 
expression to encode the spine view, we create an artefact in the 
language at exactly the point where the existentially quantified 
variable needs to be introduced. Further, one branch of the is-pair 
expression is the only place that existentially quantified variable 
exists, so the scope created by the is-pair expression also matches 
with the scope of the existentially quantified type variable. It is this 
fact that makes type inference possible. 

1.4 The Key Outcomes of this Work 

1.4.1 Correction of an error in the FCP type inference 
algorithm 

Using FCP [10] in the way we do in this work (see Section 1.5)
highlights an error in the original formulation of FCP’s type
inference algorithm. We have corrected this error and describe the
correction in this paper.

1.4.2 A type inference algorithm for the spine view of data 

We describe a type inference algorithm for the spine view of data
which requires only Hindley-Milner types and is a modest
extension to an existing system, FCP.

1.4.3 A correctness proof for the inference algorithm 

The correctness of the work in this paper is shown in two ways. 
The first is a working implementation in the DGEN compiler. DGEN 
includes an extensive set of example programs demonstrating that 
this particular spine view of data can encode a large number of 
generic programs and types for them can be inferred. 

Most importantly, the correctness of the work in this paper
is demonstrated with proofs of key properties of the systems
described. We have proofs that:

• The type relation is sound. If you can assign a type to a term 
in FCP that term can be evaluated one step to a term with the 
same type or is a value which has that type. 

• The type inference algorithm computes types for terms which 
are consistent with the type relation. If a term is given a type a 
by the type inference algorithm, then the type relation holds for 
that term having the type a and vice versa. 

1.5 FCP 

FCP is a language and associated type system that supports first
class polymorphism. FCP uses Hindley-Milner types (i.e. universal
quantifiers only occur at the top-level) and no type annotations are
required on terms. FCP adds to the lambda calculus constructors in
the style of algebraic data types. These constructors operate exactly
as they do in functional languages like Caml and Haskell. Each
constructor is effectively a constructor function which, when given
all its arguments, will construct a value of the type in question.
In FCP, as in Caml and Haskell, constructors are annotated with
a type. What FCP adds to other systems with data constructors is
that the type relation and type inference rules are able to work with
nested quantifiers in these constructor types.

For terms, FCP allows only Hindley-Milner types and there are
no annotations. For constructors, nested quantifiers are supported.
This small change is enough to support all of System F. Jones
[10] describes a type-driven algorithm for converting any System
F program to an FCP program and vice versa. Thus it is possible
to use the first-class polymorphism of FCP to support other type
system features not available in Hindley-Milner, such as
higher-ranked functions and polymorphic recursion.

FCP differs from other systems which have been used to support
higher ranks, polymorphic recursion and first class polymorphism
because it does not have type annotations on terms and does not
extend the Hindley-Milner types. Alternatives which have seen wider
usage, such as Peyton Jones et al.’s practical type inference [17] or
Rémy and Le Botlan’s MLF [11], are simpler to program with, but
require type annotations on terms. It is our primary concern to show
that the spine view of data, and in particular the is-pair expression,
does not require any annotations on terms, and thus working from a
system which included them would hamper our argument.

The systems of Peyton Jones et al. and of Rémy and Le Botlan also use
significantly different types to those of Hindley-Milner. One of our
aims in this work is to show a “baseline” for the spine view of data
in functional languages. We want our work to be applicable to as
many different existing (or future) programming languages and
compilers as possible. Thus the fact that FCP only uses Hindley-Milner
types further recommends it as a starting point for this work.


2. A Language of the Spine View 

We now describe an extended lambda calculus FCP q that supports
the spine view of data. We take as our starting point FCP [10], a
language that includes the fully applied constructor view of data
and that supports inference for first class polymorphism.

Figure 3 shows the syntax for types and terms of FCP q
(pronounced FCP-spine). FCP q is FCP with additions for the spine
view of data. Shared with FCP, FCP q has: monotypes for
constructed values and function types; type variables; type schemes
that are quantified only at the top level; lambda abstractions for
parameterising a function by a variable; function applications for
providing arguments to functions; variables; let expressions which
allow polymorphic definitions; pattern matching lambda
abstractions; constructed values built from a constructor token and
its argument values; and a “pattern matching lambda” expression for
pulling a constructor from a constructed value.

In addition to the above features taken from FCP, FCP q has the 
following new expressions to support the spine view of data: 

Recursive let expressions Generic programming relies on recursive
definitions and in particular needs polymorphic recursion.
Thus FCP q has a recursive let expression where FCP had a
non-recursive let expression.

The is-pair expression The is-pair expression is the expression
that allows us to pattern match against the spine view of data.
Its left branch is used if its discriminator is a tuple/pair and its
right branch is used for nullary constructors.

Constructors take more than one argument FCP restricts
constructors to arity 1. While this does not invalidate what we are
showing here, it rather disguises it because the whole point of
the spine view is to say that “no matter what the arity of your
constructor, we can see it as arity 2 using the spine view”. Thus
FCP q has constructors which take multiple arguments.

Figure 3. Type and expression syntax of FCP q

FCP q achieves inference for higher ranked types and polymorphic
recursion in the same way as FCP, by requiring them to be
witnessed by constructors that wrap the necessary type. Types are
wrapped with constructors and unwrapped by functions that take
the constructed value and return its contents. Jones shows that this
is enough to encode anything you can encode in System F while
maintaining type inference [10]. The result is that FCP expressions
are somewhat more complex than the equivalent in, for example,
GHC’s type system, but no type annotations are required.

A subset of FCP q expressions are designated as values (v). 
These are abstractions, decompositions, and constructors tagging 
values. 

2.1 Example generic expressions in FCP q

We now give a number of examples of the generic functions you
can write in FCP q. Type inference for FCP (and thus FCP q)
occurs in an environment in which various datatypes with associated
constructors have been defined. These can be created by datatype
definitions that introduce a constructor with a function type from
its argument type to the type of the datatype being defined. For
example, the Haskell code

    data Tagged a = Tag a

creates the type Tagged and the constructor Tag with the type
signature:

    Tag : a → Tagged a

We can also define a function to pull the data from a Tagged value:

    Tag⁻¹ = λ(Tag d).d

For our examples we need witnessing types for the identity type
∀α.α → α and a polymorphic recursive mapping type
∀β.(∀α.α → α) → β → β. Thus all the examples are written assuming
the following constructor (K_ι, K_μ) and extractor function
(K_ι⁻¹, K_μ⁻¹) definitions. The constructed types Y_ι and Y_μ are
used to witness the identity type and the polymorphic recursive
mapping type respectively.

    K_ι : (∀a.a → a) → Y_ι
    K_μ : (∀b.Y_ι → b → b) → Y_μ
    K_ι⁻¹ = λ(K_ι x).x
    K_μ⁻¹ = λ(K_μ x).x

Witnessing higher-ranked types in this way is a straightforward
task, but it tends to obfuscate the meaning of our expressions. All
definitions of higher ranked and polymorphic recursive functions
need to be put into the appropriate constructor and all uses of
these functions need to be extracted from those constructors. In other
work [18] we have shown how this system can be converted to one
that uses type annotations, making the expressions much easier to
understand.
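
For readers more familiar with GHC, the witnessing idea can be approximated with a type whose constructor field has a quantified type. The sketch below is an analogy only: it relies on GHC's RankNTypes extension rather than on FCP's inference, and the names YIota, KIota, unKIota and useTwice are hypothetical stand-ins for the identity witness type, its constructor, its extractor, and a small client.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Witness for the polymorphic identity type (forall a. a -> a).
-- The quantifier is nested inside the constructor's argument type.
newtype YIota = KIota (forall a. a -> a)

-- Extractor: unwrap the witness to recover the polymorphic function.
unKIota :: YIota -> a -> a
unKIota (KIota f) = f

-- The wrapped function is used at two different types in one body,
-- which a plain lambda-bound variable cannot be in Hindley-Milner.
useTwice :: YIota -> (Int, Bool) -> (Int, Bool)
useTwice w (n, b) = (unKIota w n, unKIota w b)
```

Only the declaration of the wrapper mentions the quantified type; the client useTwice manipulates the witness through its constructor and extractor, which is the discipline the examples in this section follow.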

Figure 4 defines an FCP q expression that applies a polymorphic
argument to all the values in the data it is passed. In bondi this
expression (a_β) is called apply to all; in Stratego it is all bottom
up.¹ This function takes a polymorphic function (which would normally
be type-indexed) and applies it to every sub-value in its second
argument. It does this by calling the function on the whole value and
recursively applying itself to each part of the tuple under the spine
view. The recursive expression a_β is polymorphically recursive: it
is applied recursively to two arguments with two different types
in its own body. The first argument to a_β is applied to (possibly)
three different types in the body of the expression and thus must be
witnessed by a K_ι. This is a concrete example of why we need to
begin with a system that supports type inference for polymorphic
recursion and higher-ranked functions.
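
The shape of this bottom-up traversal can be sketched in Haskell over an untyped model of spine-view values. This is a hedged illustration rather than the FCP q term: Val, ispair, app, applyAll and incr are names assumed here, and because every value has the single type Val the sketch needs neither witnessing constructors nor polymorphic recursion.

```haskell
-- Untyped model of spine-view values (illustrative names, not from the paper).
data Val
  = Con String [Val]  -- constructor name and arguments attached so far
  | Lit Int           -- integers behave like nullary constructors
  deriving (Eq, Show)

-- Spine-view case analysis: pair (all but last argument, last argument) or leaf.
ispair :: Val -> (Val -> Val -> r) -> r -> r
ispair (Con c args@(_ : _)) inBranch _ = inBranch (Con c (init args)) (last args)
ispair _ _ elseBranch = elseBranch

-- Data application: attach one more argument to a constructor.
app :: Val -> Val -> Val
app (Con c args) y = Con c (args ++ [y])
app (Lit _) _ = error "app: an integer cannot take arguments"

-- Bottom-up traversal: transform both parts of the pair, rebuild the
-- value from them, then apply f to the rebuilt whole. Leaves go to f directly.
applyAll :: (Val -> Val) -> Val -> Val
applyAll f d = ispair d (\x y -> f (app (applyAll f x) (applyAll f y))) (f d)

-- Example transformation: add one to integers, leave everything else alone.
incr :: Val -> Val
incr (Lit n) = Lit (n + 1)
incr v = v
```

Note that f is also applied to partially applied constructors met along the spine, so a transformation passed to applyAll has to cope with them; incr does so by returning them unchanged.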

We can define a top-down version (a_τ) of this expression using
the spine view:

    letrec a_τ = K_μ (λf.λd.
        ispair ((K_ι⁻¹ f) d) bind (x, y)
        in ((K_μ⁻¹ a_τ) f x) ((K_μ⁻¹ a_τ) f y)
        else (K_ι⁻¹ f) d)

We can also write expressions that collect a single value from
a data structure, called generic queries. Figure 5 shows an
expression, q_τ, which queries from the top-down. q_τ takes as
arguments a polymorphic accumulator function, a start value and data
to query. It recursively calls itself on the left-hand-side of the
tuple using the query value from a recursive call of itself on the
right-hand-side of the tuple as a starting value. The start value of
this second recursive call is generated by calling the accumulator
function on the whole value with the original start value. Again, we
require witnessing constructors and extractors for some of the types:

    K_κ : (∀a.r → a → r) → Y_κ r
    K_ξ : (∀b.Y_κ r → r → b → r) → Y_ξ r
    K_κ⁻¹ = λ(K_κ x).x
    K_ξ⁻¹ = λ(K_ξ x).x

Again we can control the order of the traversal using the spine
view; the following definition is a bottom up query (q_β):

    letrec q_β = K_ξ (λf.λs.λd.
        ispair d bind (x, y)
        in (K_κ⁻¹ f) ((K_ξ⁻¹ q_β) f ((K_ξ⁻¹ q_β) f s y) x) d
        else (K_κ⁻¹ f) s d)

1 Stratego only behaves the same as FCP q in this instance if the strategy 
being passed to all bottom up can’t fail. 


    let gfoldl = λk.λz.λp.
        ispair p bind (x, y)
        in k (gfoldl k z x) y
        else z p

Figure 6. SYB’s gfoldl in FCP q


Terms like a_τ, a_β, q_τ and q_β can’t be written in FCP, but they
can be written in FCP q. Since we know type inference is possible
for FCP, the question we are answering in this paper is “can we
define type inference for FCP q?” If we can’t then it is the extension
of FCP, i.e. the spine view, which precludes type inference. In fact
we show that the spine view does not preclude type inference.

Also, we know that we can define terms like a_τ, a_β, q_τ and q_β in
type systems with type annotations and which do type
checking/inference in both directions, as in algorithm M [12]. We know
this because of all the GHC-based implementations of these functions.
What we are answering in this paper is “can we perform type
inference, with no annotation of terms, for these terms”. The answer,
happily, is yes.

We answer these questions by defining, and proving correct, a
type relation and a type inference algorithm on FCP q.

2.2 Comparison to Scrap Your Boilerplate 

FCP q is related to Scrap Your Boilerplate (SYB) [14] quite closely.
We now compare the two and use SYB primitives as a further
example of FCP q. In this section we don’t include the witnessing
constructors and destructors, making the examples simpler to read.

Firstly, both SYB and FCP q use the spine view of data to
see arbitrary data in a uniform way. SYB exposes the spine view
via two non-recursive combinators, gmapQ and gmapT. These
non-recursive combinators are turned into recursive traversal and
query functions by the library functions everywhere and everything.
All of these, however, are built from one underlying combinator:
gfoldl. The ispair operation in FCP q is an alternative to gfoldl
but, as we will see, it is more suitable as an extension for an
underlying calculus because it is more flexible than gfoldl. The
ispair expression is the minimal addition required to a core calculus
to achieve what gfoldl achieves in SYB, but it is also capable of
directly encoding the other SYB combinators gmapQ and gmapT. To
see this, we will encode all three in FCP q.

Figure 6 shows how FCP q can encode gfoldl. Notice that
one definition of gfoldl will work for all values of all datatypes.
This contrasts with SYB where an instance of this function must
be created, either by the programmer or by the compiler, for each
datatype. This then enforces the need for type classes in the SYB
approach. FCP q is both simpler and requires less compiler
machinery to do the same thing.
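
A single definition covering every datatype can be illustrated over an untyped Haskell model of spine-view values. This is a sketch under stated assumptions, not SYB's gfoldl: the names Val, ispair, app, gfoldlSpine, arity and rebuildViaFold are introduced here, and the result type is a plain r rather than SYB's type constructor c, because the model has only one value type.

```haskell
-- Untyped model of spine-view values (illustrative names, not from the paper).
data Val
  = Con String [Val]  -- constructor name and arguments attached so far
  | Lit Int           -- integers behave like nullary constructors
  deriving (Eq, Show)

-- Spine-view case analysis: pair (all but last argument, last argument) or leaf.
ispair :: Val -> (Val -> Val -> r) -> r -> r
ispair (Con c args@(_ : _)) inBranch _ = inBranch (Con c (init args)) (last args)
ispair _ _ elseBranch = elseBranch

-- Data application: attach one more argument to a constructor.
app :: Val -> Val -> Val
app (Con c args) y = Con c (args ++ [y])
app (Lit _) _ = error "app: an integer cannot take arguments"

-- Fold along the spine: z handles the bare constructor at the end of the
-- spine, k combines the folded left part with each argument in turn.
gfoldlSpine :: (r -> Val -> r) -> (Val -> r) -> Val -> r
gfoldlSpine k z p = ispair p (\x y -> k (gfoldlSpine k z x) y) (z p)

-- Number of arguments attached to the outermost constructor.
arity :: Val -> Int
arity = gfoldlSpine (\n _ -> n + 1) (const 0)

-- Rebuilding a value with the fold gives back the same value.
rebuildViaFold :: Val -> Val
rebuildViaFold = gfoldlSpine app id
```

Both arity and rebuildViaFold work on any Val without a per-datatype instance, which is the point being made about the single definition.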

From here we could build gmapT and gmapQ from gfoldl, but
FCP q allows us an alternative. Figure 7 shows direct encodings of
these functions in FCP q (assuming a list datatype with constructors
C and N and a concatenation function concat).

Directly defining pattern matching against the spine view opens
up the opportunity to mix spine view patterns with fully applied
constructor patterns, as shown by Jay [7].

SYB also includes a system for type indexed functions with
the ext combinators. Type indexed functions allow for specific
behaviour at certain nodes in the value being operated on. The
methods for defining such functions can be combined with this
work, as we have shown in the dgen compiler where an extension
mechanism very similar to the one used in SYB has been combined
with FCP q and a run-time representation to create a full generic
programming language.

Left (FCP q):

    letrec a_β = K_μ (λf.λd.
        ispair d bind (x, y)
        in (K_ι⁻¹ f) (((K_μ⁻¹ a_β) f x) ((K_μ⁻¹ a_β) f y))
        else (K_ι⁻¹ f) d)

Right (type annotations):

    letrec a_β : ∀b.(∀a.a → a) → b → b =
        λf : ∀a.a → a.λd.
        ispair d bind (x, y)
        in f ((a_β f x) (a_β f y))
        else f d

Figure 4. A generic function similar to Stratego’s all bottom up. On the left is the expression in FCP q, on the right is the same function
written using type annotations in the style of GHC instead of witnessing constructors. The type annotation version is given as a guide to
understanding the FCP q version.

Left (FCP q):

    letrec q_τ = K_ξ (λf.λs.λd.
        ispair d bind (x, y) in (K_ξ⁻¹ q_τ) f ((K_ξ⁻¹ q_τ) f ((K_κ⁻¹ f) s d) y) x
        else (K_κ⁻¹ f) s d)

Right (type annotations):

    letrec q_τ : ∀r.∀b.(∀a.r → a → r) → r → b → r =
        λf : ∀a.r → a → r.λs.λd.
        ispair d bind (x, y) in q_τ f (q_τ f (f s d) y) x
        else f s d

Figure 5. A generic query. On the left is the expression in FCP q, on the right is the same expression written using type annotations instead
of witnessing constructors.

    let gmapT = λf.λd.
        ispair d bind (x, y)
        in (gmapT f x) (f y)
        else d

    let gmapQ = λf.λd.
        ispair d bind (x, y)
        in concat(gmapQ f x, C(f y, N))
        else N

Figure 7. Direct encodings of SYB combinators in FCP q

2.3 Operational Semantics of FCP q 

Figure 8 provides an operational semantics for FCP q that captures
the normal lambda calculus evaluation style, the expected operation
of pattern matching lambdas, and the spine view of data via ispair.
The evaluation rules make use of a substitution operation in which
[x/e]f denotes the substitution of e for free occurrences of x in f.

The evaluation rules for application, lambda abstraction, and the 
E-Conl rule are shared with FCP. The operation of the recursive 
let is specific to FCP q but is entirely standard. The four evaluation 
rules added to support the spine view are: 

E-Con2 If a constructed value is applied to an expression, the 
constructed value is treated like a function expecting an argument 
and it consumes the expression. The result is that the constructor 
is given the expression as a new final argument. This operation 
can only occur on constructors that have previously been 
stripped of one of their arguments and this constraint is enforced 
by the type system. This is how we “put data back together” 
according to the spine view. 

E-IsPair1 If the discriminator of an is-pair expression is not a 
value, evaluate it one step. 

E-IsPair2 If the discriminator of an is-pair is a value and it is 
a constructor with at least one argument attached, bind the 
constructor and all arguments but the last to x, bind the last 
argument to y and evaluate the left branch. This is how we “pull 
data apart” according to the spine view. 

E-IsPair3 If the discriminator of an is-pair is a value and it is not 
a constructor with at least one argument attached, evaluate the 
right branch. 
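
The rules above can be sketched as plain functions over already evaluated values. This is a hedged illustration only: the tuple encoding `("K", v1, ..., vm)` of constructed values and the names `apply_con` and `ispair` are assumptions made for the example, and the congruence rule E-IsPair1 is left to the host language's evaluation order.

```python
from typing import Any, Callable, Tuple

Value = Any  # assumption: a constructed value is a tuple ("K", v1, ..., vm)


def is_constructed(v: Value) -> bool:
    """True for tuples whose first component is a constructor name."""
    return isinstance(v, tuple) and len(v) >= 1 and isinstance(v[0], str)


def apply_con(k: Tuple, e: Value) -> Tuple:
    """E-Con2: K(v1,...,vm) e  ->  K(v1,...,vm, e)."""
    assert is_constructed(k)
    return k + (e,)


def ispair(d: Value,
           then_branch: Callable[[Tuple, Value], Any],
           else_branch: Callable[[], Any]) -> Any:
    """E-IsPair2 when d has at least one argument, otherwise E-IsPair3."""
    if is_constructed(d) and len(d) > 1:
        x, y = d[:-1], d[-1]  # x: constructor minus last argument, y: last argument
        return then_branch(x, y)
    return else_branch()


pair = ("Pair", 1, "a")
parts = ispair(pair, lambda x, y: (x, y), lambda: None)
rebuilt = ispair(pair, lambda x, y: apply_con(x, y), lambda: None)
```

Pulling a value apart with `ispair` and immediately reapplying `x` to `y` gives back the original value, which is the round trip the two new kinds of rule are meant to support.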

3. Typing the Spine View 

Figure 9 describes the typing relation of FCP q . A ⊢ e : τ denotes 
the expression e having the type τ given the environment A. An 
environment is a set of variable to type scheme bindings. A_x, x : σ 
is the environment A where any existing binding for x has been 
replaced by σ. The variable, application, let expression, and lambda 
abstraction rules are based on the equivalent rules in FCP but we 
use generalisation in specific places rather than have a generalisation 
rule. Being specific about generalisation simplifies the proofs 
of type system and type inference properties. A type scheme σ is a 
generalisation of a type τ, denoted σ ≻ τ, if there is some substitution 
for the bound variables of σ that gives τ. We can generalise 
a type τ to a type scheme with the Gen operation 

Gen(τ, A) = ∀α_1 ... α_n.τ where {α_1 ... α_n} = TV(τ) \ TV(A) 
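
As a minimal sketch of the Gen operation, assuming a toy type language with only variables and function types: the class names `TVar`, `TFun` and `Scheme` and the helper names are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Union


@dataclass(frozen=True)
class TVar:
    name: str


@dataclass(frozen=True)
class TFun:
    arg: "Type"
    res: "Type"


Type = Union[TVar, TFun]


@dataclass(frozen=True)
class Scheme:
    bound: FrozenSet[str]  # the quantified variables
    body: Type


def tv(t: Type) -> FrozenSet[str]:
    """TV(t): the type variables occurring in a type."""
    if isinstance(t, TVar):
        return frozenset({t.name})
    return tv(t.arg) | tv(t.res)


def tv_env(env: Dict[str, Scheme]) -> FrozenSet[str]:
    """TV(A): the free type variables of every scheme bound in the environment."""
    out: FrozenSet[str] = frozenset()
    for scheme in env.values():
        out |= tv(scheme.body) - scheme.bound
    return out


def gen(t: Type, env: Dict[str, Scheme]) -> Scheme:
    """Gen(t, A): quantify the variables of t that are not free in A."""
    return Scheme(tv(t) - tv_env(env), t)


# With y : b in the environment, only a is generalised in a -> b.
env = {"y": Scheme(frozenset(), TVar("b"))}
scheme = gen(TFun(TVar("a"), TVar("b")), env)
```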

While we need a new evaluation rule to deal with putting spine data 
back together, we don’t need a special type rule for this because the 
normal function application rule T-App works perfectly. 

The fact that function application and data application have the 
same type rules is a key property of Jay’s pattern calculus [7]. Our 
work has validated Jay’s findings. 

Specific to FCP q is the T-IsPair rule. How can we determine the 
type rule for ispair? Our starting point is the following question: 
given that d has type D, and assuming d is a tuple, what are the 
types of x and y in ispair d bind (x, y) in f else g? 

If d is a tuple, it is a constructor applied to some arguments. x is 
the constructor with all the arguments but the last, so it could have 
the type of a function that, if given that last argument, will return 
a value of the original type, i.e. argtype → D. y is exactly the 
argument that was peeled off, so its type is argtype. However, we 
don’t know anything about argtype. Since we only know d has type 
D, we can say nothing about its last argument; we only know that 
there is one. We might be tempted then to make it a type variable, 
giving x the type α → D and y the type α. However, this variable 
would be implicitly universally quantified and since x and y will 
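
E-App1      e → e' 
            ------------------------------------------ 
            e f → e' f 

E-App2      (λx.e) v → [x/v]e 

E-LetRec1   e → e' 
            ------------------------------------------ 
            letrec x = e in f → letrec x = e' in f 

E-LetRec2   f → f' 
            ------------------------------------------ 
            letrec x = v in f → letrec x = v in f' 

E-LetRec3   x ∈ FV(v_f) 
            ------------------------------------------ 
            letrec x = v in v_f → letrec x = v in [x/v]v_f 

E-LetRec4   x ∉ FV(v_f) 
            ------------------------------------------ 
            letrec x = v in v_f → v_f 

E-PLam      (λ(K x_1,..., x_n).e) K(v_1,..., v_n) → [x_1/v_1] ··· [x_n/v_n]e 

E-Con1      e_j → e' 
            ------------------------------------------ 
            K(v_1,..., v_{j-1}, e_j, e_{j+1},..., e_m) → K(v_1,..., v_{j-1}, e', e_{j+1},..., e_m) 

E-Con2      K(v_1,..., v_m) e → K(v_1,..., v_m, e) 

E-IsPair1   e → e' 
            ------------------------------------------ 
            ispair e bind (x, y) in f else g → ispair e' bind (x, y) in f else g 

E-IsPair2   m ≥ 1 
            ------------------------------------------ 
            ispair (K(v_1,..., v_m)) bind (x, y) in f else g → [x/K(v_1,..., v_{m-1}), y/v_m]f 

E-IsPair3   ispair K() bind (x, y) in f else g → g 

Figure 8. Operational Semantics (e → f) of FCP q 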


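T-Var      (x : σ) ∈ A    σ ≻ τ 
           ------------------------------------------ 
           A ⊢ x : τ 

T-App      A ⊢ e : τ' → τ    A ⊢ f : τ' 
           ------------------------------------------ 
           A ⊢ e f : τ 

T-Abs      A_x, x : τ' ⊢ e : τ 
           ------------------------------------------ 
           A ⊢ λx.e : τ' → τ 

T-LetRec   A_x, x : τ' ⊢ e : τ'    σ = Gen(τ', A)    A_x, x : σ ⊢ f : τ 
           ------------------------------------------ 
           A ⊢ letrec x = e in f : τ 

for each K : ((∀α_1.τ'_1), ..., (∀α_n.τ'_n)) → t_K 
where t_K is unique to this K 

T-Con1     ∀i.(A ⊢ e_i : τ'_i)    ∀i.(α_i ∉ TV(A)) 
           ------------------------------------------ 
           A ⊢ K(e_1,..., e_n) : t_K 

T-Con2     ∀i ∈ 1...m.(A ⊢ e_i : τ'_i) 
           ------------------------------------------ 
           A ⊢ K(e_1,..., e_m) : τ'_{m+1} → (··· → (τ'_n → t_K) ···) 

T-PLam     A_{x_1...x_n}, x_1 : [α_1/ρ_1]τ'_1, ..., x_n : [α_n/ρ_n]τ'_n ⊢ e : τ_e 
           ------------------------------------------ 
           A ⊢ λ(K(x_1,..., x_n)).e : t_K → τ_e 

T-IsPair   A ⊢ e : τ_e    A_{x,y}, x : α → τ_e, y : α ⊢ f : τ 
           A ⊢ g : τ    α ∉ TV(A, τ, τ_e) 
           ------------------------------------------ 
           A ⊢ ispair e bind (x, y) in f else g : τ 

Figure 9. Type Relation (A ⊢ e : τ) 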


appear separately, we lose the fact that the two α variables represent 
the same something. The solution to this problem is to existentially 
quantify this type variable. 

Our approach differs from the compound calculus dialect of the 
pattern calculus, for example, because the latter separates access to 
the left and right parts of the pair without a scope to existentially 
quantify the introduced type variable. In FCP q , x and y only occur 
in the in branch of an is-pair expression which is the ideal site at 
which to introduce a fresh existentially quantified type variable to 
constrain the use of these unpaired variables. 

The T-IsPair rule in Figure 9 shows the type relation that reflects 
this understanding of the is-pair expression. We have proven that 
the type system in Figure 9 is sound. The proof is based on the 
small-step semantics in Figure 8 and proves both progress (if A ⊢ 
e : τ then e → e' or e is a value) and preservation (if A ⊢ e : τ 
and e → e' then A ⊢ e' : τ). 

3.1 Details of the Proof 

We have followed the syntactic approach of Wright and Felleisen 
[20] in constructing our proofs, thus the proofs of type system 
soundness for FCP q are almost a superset of the equivalent proofs 
for FCP. For this reason we do not describe the proofs in detail, 
instead focussing on the is-pair expression and the treatment of 
partially applied constructors. A version of the full proof (including the 
FCP segments) has previously been published in Robert’s doctoral 
thesis [18]. 

Our proof of type system soundness hinges on two main results: 

Progress If A ⊢ e : τ then e → e' or e is a value. 

Preservation If A ⊢ e : τ and e → e' then A ⊢ e' : τ. 

If both progress and preservation are proven then the type system 
is sound. 

Theorem 3.1 (Progress). If A ⊢ e : τ then e → e' or e is a value. 

Proof. The proof proceeds by induction on the length of the type 
deduction for A ⊢ e : τ, with one case for each possible final 
deduction rule. Here we only give the cases relating to the spine 
view of data. 


Case T-IsPair If the final deduction rule is T-IsPair and A ⊢ 
(ispair e bind (x, y) in f else g) : τ we have 

A ⊢ e : τ_e (1) 

This, with the induction hypothesis, gives either e is a value or 
e → e'. Furthermore, if e is a value, it is either a constructed 
value (K(v_1,..., v_n)) or it is not. These possibilities give rise 
to the following cases: 

e is not a value By e → e' and E-IsPair1 we have ispair 
e bind (x, y) in f else g → ispair e' bind (x, y) in 
f else g. 






e is a constructed value Say e is the constructed value 
K(v_1,..., v_n). By E-IsPair2 we have 
ispair K(v_1,..., v_n) bind (x, y) in f else g → 
[x/K(v_1,..., v_{n-1}), y/v_n]f. 

e is a value, but not a constructed value Say e is the 
non-constructed value v_e. By E-IsPair3 we have ispair 
v_e bind (x, y) in f else g → g. 

Case T-Con1 If A ⊢ K(e_1,..., e_n) : τ, and all e_i are values, 
then K(e_1,..., e_n) is a value. If A ⊢ K(e_1,..., e_n) : τ 
and one of the e_i is not evaluated (say e_j), then by T-Con1, A ⊢ 
e_j : τ_j. Hence by the induction hypothesis e_j → e'_j and 
K(e_1,..., e_j,..., e_n) → K(e_1,..., e'_j,..., e_n). 

□ 


(id)       τ ~^id τ mod V 

(var)      α ∉ V ∪ TV(τ) 
           ------------------------------------------ 
           α ~^[τ/α] τ mod V      τ ~^[τ/α] α mod V 

           τ ~^U ν mod V      Uτ' ~^U' Uν' mod V 
           ------------------------------------------ 
           (τ → τ') ~^U'U (ν → ν') mod V 

Figure 10. Rules for unification. This version of unification 
reduces to normal unification when the set V is empty. 


Theorem 3.2 (Preservation). If A ⊢ e : τ and e → e' then 
A ⊢ e' : τ. 

Proof. The proof proceeds by induction on the depth of the 
evaluation tree, with one case for each possible final reduction e → e'. 
Here we only give the cases related to the spine view of data. 

Case E-IsPair2 If ispair K(v_1,..., v_m) bind (x, y) in f else 
g → [x/K(v_1,..., v_{m-1}), y/v_m]f and A ⊢ ispair 
K(v_1,..., v_m) bind (x, y) in f else g : τ, by T-IsPair 
we have 

A ⊢ K(v_1,..., v_m) : τ_e (2) 

A_{x,y}, x : α → τ_e, y : α ⊢ f : τ (3) 

where α is unique. From (2) and T-Con2 we have 

∀i ∈ 1...m.(A ⊢ v_i : τ'_i) (4) 

By (4) and T-Con2 again (this time in the opposite direction) 

A ⊢ K(v_1,..., v_{m-1}) : τ'_m → τ_e (5) 

Note also that (4) includes the fact that A ⊢ v_m : τ'_m. This with 
(5), (3) and Lemma 3.1 gives A ⊢ [x/K(v_1,..., v_{m-1}), y/v_m]f 
: τ as required. 

Case E-IsPair3 If ispair v bind (x, y) in f else g → g and 
A ⊢ ispair v bind (x, y) in f else g : τ, by T-IsPair we 
have A ⊢ g : τ as required. 

□ 

One particularly important lemma used in the proof is 

Lemma 3.1 (Existential Instantiation). If A_{x,y}, x : α → τ', y : α ⊢ 
e : τ and α ∉ TV(A, τ, τ', ρ') and A ⊢ v' : ρ' → τ' and 
A ⊢ v'' : ρ' then A ⊢ [x/v', y/v'']e : τ for any ρ'. 

Proof. The proof is by induction on the length of the type deduction 
for A_{x,y}, x : α → τ', y : α ⊢ e : τ, with one case for each possible 
final deduction rule. □ 

4. Type Inference for the Spine View 

In this section we define a type inference algorithm for FCP q . This 
type inference algorithm is built from the type relation in Figure 9 
and we have developed both a proof of correctness and a working 
implementation in the DGEN compiler. 2 As noted in Section 2, 
full type inference is generally not possible for higher ranks and 
polymorphic recursion. FCP q solves this problem by insisting that 
these types be witnessed by a constructor. Figure 11 describes the 
type inference algorithm for FCP q . The type inference algorithm, 
denoted TA ⊢w e : τ mod V, takes as input an expression e, a set of 
fixed-for-unification variables V, and an environment A. It calculates a 
set of substitutions T, which are applied to the environment, and a 
type τ. UT denotes a substitution U applied to a substitution T. 
Unification (Figure 10), written τ ~^U ρ mod V, takes two types τ 
and ρ and calculates a substitution U that unifies those types. To 
enforce fixed-for-unification variables, FCP q uses the unification 
algorithm of FCP which keeps track of those variables (as V) in 
the environment that must be handled in this way and adds these 
variables to the occurs check. 

2 DGEN implements a variant of the type inference algorithm we present 
here. Specifically, a type annotation version of FCP is used as the basis of 
type inference in DGEN. Crucially, the rules relating to the spine view are 
not impacted by this adjustment. 

The type rules for variables, abstractions and applications are 
taken directly from FCP. The rule for pattern matching lambdas is 
derived from the equivalent FCP rule. The variable, abstraction, 
application and let expression rules are equivalent to the 
corresponding Hindley-Milner rules, but the set of fixed-for-unification 
variables is passed around. The constructor rule enforces the 
requirement that a constructor tagging values has the correct type for those 
values by unifying them; it also enforces the quantification of the 
variable α by ensuring it is unique. 

The recursive let rule is specific to FCP q but does not require 
anything new because polymorphic recursion is dealt with by 
witnessing constructors. Inference for normal function application can 
deal with putting spine data back together but we describe an 
entirely novel algorithm for type inference of is-pair expressions. We 
can provide type inference rules for the existentially quantified type 
variable in an is-pair expression without having to provide a 
complete implementation of existential typing. Specifically, all we need 
to do is ensure that any new existential variables introduced by an 
is-pair expression are treated like constants for the purposes of 
unification in the body of the is-pair expression, which can already be 
done with our unification algorithm by adding that variable to the 
set of fixed-for-unification variables. 
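
Treating a variable "like a constant" during unification can be sketched as follows. This is an illustrative approximation of unification modulo a set of fixed variables, not the FCP or dgen code: the type representation, the names `unify`, `bind`, `apply`, `compose` and `UnifyError`, and the dictionary encoding of substitutions are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Union


@dataclass(frozen=True)
class TVar:
    name: str


@dataclass(frozen=True)
class TCon:
    name: str


@dataclass(frozen=True)
class TFun:
    arg: "Type"
    res: "Type"


Type = Union[TVar, TCon, TFun]
Subst = Dict[str, Type]


class UnifyError(Exception):
    pass


def tv(t: Type) -> FrozenSet[str]:
    if isinstance(t, TVar):
        return frozenset({t.name})
    if isinstance(t, TFun):
        return tv(t.arg) | tv(t.res)
    return frozenset()


def apply(s: Subst, t: Type) -> Type:
    if isinstance(t, TVar):
        return s.get(t.name, t)
    if isinstance(t, TFun):
        return TFun(apply(s, t.arg), apply(s, t.res))
    return t


def compose(s2: Subst, s1: Subst) -> Subst:
    """The substitution that applies s1 first and then s2."""
    out = {v: apply(s2, t) for v, t in s1.items()}
    for v, t in s2.items():
        out.setdefault(v, t)
    return out


def bind(a: str, t: Type, fixed: FrozenSet[str]) -> Subst:
    # A variable may be bound only if it is not fixed and passes the occurs check.
    if a in fixed or a in tv(t):
        raise UnifyError(f"cannot bind {a}")
    return {a: t}


def unify(t1: Type, t2: Type, fixed: FrozenSet[str]) -> Subst:
    """Unify t1 and t2 without ever binding a variable in `fixed`."""
    if t1 == t2:
        return {}
    if isinstance(t1, TVar) and t1.name not in fixed:
        return bind(t1.name, t2, fixed)
    if isinstance(t2, TVar):
        return bind(t2.name, t1, fixed)
    if isinstance(t1, TFun) and isinstance(t2, TFun):
        u = unify(t1.arg, t2.arg, fixed)
        u2 = unify(apply(u, t1.res), apply(u, t2.res), fixed)
        return compose(u2, u)
    raise UnifyError(f"cannot unify {t1} with {t2}")
```

With `fixed` empty this behaves like ordinary first-order unification. With the existential variable of an is-pair expression placed in `fixed`, an attempt to use `y` at a concrete type such as `Int` fails, while unifying an ordinary variable with the fixed one still succeeds.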

The type inference rule for is-pair expressions first calculates 
the type of the expression being tested (c). This inferred type is 
used as the basis for the type of the conditional in the first branch. 
We store two types for x and y in the environment when inferring 
the type for t. The first (β → τ_c) is the type for x anywhere in the 
body of the branch and the second (β) is the type for y. When we 
are inferring the type of the t branch, we ensure that β is treated 
as an existential type variable by adding it to the set of 
fixed-for-unification variables. In this environment we calculate the type for 
the first branch. This is all the hard work done and we use standard 
techniques to get the type for the second branch and to unify that 
type with the type we got for the first branch. 

In this way, Figure 11 describes a type inference algorithm for 
a language that supports both the fully applied constructor view 
and the spine view of data. This type inference algorithm has been 
implemented in the generic programming language DGEN [18], a 

I-Var      (x : ∀α.τ) ∈ A    β new 
           ------------------------------------------ 
           A ⊢w x : [β/α]τ mod V 

I-App      TA ⊢w e : τ mod V    T'TA ⊢w f : τ' mod V 
           T'τ ~^U (τ' → α) mod V    α new 
           ------------------------------------------ 
           UT'TA ⊢w e f : Uα mod V 

I-Abs      T(A_x, x : α) ⊢w e : τ mod V    α new 
           ------------------------------------------ 
           TA ⊢w λx.e : Tα → τ mod V 

I-LetRec   T(A_x, x : α) ⊢w e : τ mod V    Tα ~^U τ mod V    α new 
           σ = Gen(UTA, Uτ)    T'(UTA_x, x : σ) ⊢w f : ρ mod V 
           ------------------------------------------ 
           T'UTA ⊢w letrec x = e in f : ρ mod V 

I-Con      σ_K = ∀γ.(∀α_1.τ_1) → ··· → (∀α_n.τ_n) → τ'    α_1,..., α_n, γ new 
           ∀i ∈ {1,..., n}.(T_i A ⊢w e_i : ρ_i mod V) 
           U = U_1 ... U_n    T = T_1 ... T_n 
           ρ_i ~^U_i τ_i mod (V ∪ {α_i})    α_i ∉ TV(U_i T_i A, U_i τ') 
           ------------------------------------------ 
           UTA ⊢w K(e_1,..., e_n) : Uτ' mod V 

I-PLam     σ_K = ∀γ.(∀α_1.τ_1) → ··· → (∀α_n.τ_n) → τ'    α_1,..., α_n, γ new 
           T(A_{x_1...x_n}, x_1 : τ_1,..., x_n : τ_n) ⊢w e : τ_e mod V 
           ------------------------------------------ 
           TA ⊢w λ(K x_1 ··· x_n).e : Tτ' → τ_e mod V 

I-IsPair   TA ⊢w c : τ_c mod V    T'T(A, x : β → τ_c, y : β) ⊢w t : τ_t mod (V ∪ {β}) 
           T''A ⊢w e : τ_e mod V    τ_t ~^U τ_e mod V    β new    β ∉ TV(UTA, Uτ_t) 
           ------------------------------------------ 
           UT''T'TA ⊢w ispair c bind (x, y) in t else e : Uτ_e mod V 

Figure 11. Type Inference for FCP q 


language that supports both the spine view of data and a method of 
creating type indexed functions. The distribution of DGEN contains 
many example generic expressions and supports experimenting 
with the spine view we have described here. 

4.1 Type Inference Correctness 

We have proven the soundness and completeness of the type 
inference algorithm with respect to the type relation of Figure 9. 

Theorem 4.1. If TA ⊢w e : τ mod V then (TA) ⊢ e : τ. 

Proof. The proof proceeds by induction on the length of the type 
inference for TA ⊢w e : τ mod V, with one case for each possible 
final step. The proof uses the same structure as the syntactic proof 
for type relation soundness and is routine. The fact that β is 
fixed-for-unification in I-IsPair and T-IsPair is important in the proof 
because it ensures the unification of τ_e and τ_t will generate an 
appropriate unification U. 

The main source of potential difficulty in this proof (and the 
next) is the existentially quantified variable and its enforcement by 
adding fixed-for-unification variables to the unification algorithm. 
However, FCP itself allows existentially quantified variables and 
uses the fixed-for-unification variables to perform type inference 
for these. Thus everything required for the FCP q proof is already 
present in the equivalent proof for FCP. □ 

Theorem 4.2. If (TA) ⊢ e : τ then TA ⊢w e : τ' mod V where τ' ≻ 


Proof. The proof proceeds by induction on the length of the type 
inference for A ⊢ e : τ, with one case for each possible final 
deduction rule. As above, the proof is a routine application of proof 
by structural induction. Side conditions must be used to ensure the 
required properties of unification and substitution application. □ 

We have also implemented a variant of the type inference algorithm 
in the DGEN compiler. DGEN includes: the spine view of data 
encoded via ispair, the fully applied constructor view via case 
statements and function definitions, type-indexed functions via an 
extension primitive, and a run-time which supports both the spine 
view and the fully applied constructor view. Because the 
implementation in DGEN of the type inference algorithm predates this 
presentation of the work, it is not identical to the one we present 
here, but it is the same in all matters of substance. The interested 
reader can download DGEN and run the type inference algorithm 
over large code examples. A number of examples are included 
in the DGEN distribution including: generic traversal varying by 
traversal order and exit condition, generic show, generic equality, 
generic zip, generic map, and generic query [18]. 

4.2 FCP Correction 

The algorithm in Figure 11 includes a correction to the original 
FCP algorithm which we now describe. 

In the original formulation of FCP [10] the side condition on 
the I-Con rule (there called make) is given as α ∉ TV(UTA). 
However, the algorithm with this side condition calculates the 
incorrect type for some terms. In particular, it will calculate the type 
T β' (where β' is not bound in the type environment) for the term 
K (λx.x) in the context of the constructor K, σ_K = ∀α.(∀β.β → 
α) → T α. We have tracked the error to an implicit constraint on 
the type relation which is not explicitly respected in the type 
inference algorithm. Adding that α ∉ TV(Uτ') to the side condition 
of I-Con restores the constraint, correcting the type inference algorithm. 
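
One way to see how the variable escapes is to trace the example by hand. The derivation below is a sketch under stated assumptions: γ is a fresh instance of the outer α, β' is the fresh copy of the bound variable β and is fixed for unification, δ is a hypothetical name for the variable inferred for x, and the environment A is taken to be empty.

```latex
\begin{align*}
\sigma_K &= \forall\alpha.(\forall\beta.\beta \to \alpha) \to T\,\alpha
  && \text{argument type } \beta' \to \gamma,\ \text{result type } T\,\gamma \\
\lambda x.x &: \delta \to \delta
  && \text{type inferred for the argument} \\
\delta \to \delta &\sim \beta' \to \gamma \pmod{V \cup \{\beta'\}}
  && \text{unifier } U \text{ sends } \delta \mapsto \beta',\ \gamma \mapsto \beta' \\
U(T\,\gamma) &= T\,\beta'
  && \beta' \notin TV(UTA) \text{ holds, but } \beta' \in TV(U\tau')
\end{align*}
```

Under these assumptions only the fixed variable β' is never bound, so unification succeeds and the original side condition is satisfied, yet β' appears in the result type. The additional condition rejects the term because β' occurs in Uτ'.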

5. Related Work 

The Pattern Calculus The spine view is fundamental to the 
pattern calculus which is a calculus that sets pattern matching as the 
guiding principle of programming. We first saw data considered 
and pulled apart according to the spine view in Jay’s early work [6] 
on the pattern calculus and later work [7] has taken the evaluation 
and typing of the spine view of data to great lengths, resulting in 
the pure pattern calculus [8]. 

Our work takes a slightly different approach to the semantics of 
the spine view than any of the pattern calculus formulations, but is 
largely in line with them. The primary difference is the restriction 
of the partially applied constructor to a variable bound inside an 
is-pair expression. It is this difference that allows us to formulate a 
type inference algorithm for this expression. 

The pattern calculus gives a full account of evaluation using 
the spine view and extends it to allow dynamic patterns, which 
we do not consider here. The pattern calculus also has various 
type systems defined upon it. However, type inference for the 
pattern calculus is not described in any published work and the only 
documented version is in the source code of the bondi compiler. 
There it uses “aggressive assumptions” [7] to make inference work. 
What we present here is a type inference algorithm for a language 
very closely related to the pattern calculus, but we fully describe it, 
show its derivation, and prove its correctness. 

The Spine View The spine view first came to broad attention 
through the Scrap Your Boilerplate (SYB) series of papers and 
libraries [14-16] although it was not known as such at the time. 
Work by Hinze et al. [3, 5] exposed the spine view that underlies 
SYB. Our work is broadly compatible with both these systems 
but has a subtly different approach. In both these systems it is 
the evaluation mechanics and fitting the system into Haskell’s type 
system that are of primary concern. By taking the very smallest and 
most fundamental version and exposing it in a simple language we 
have added a specific type system for this view, including a type 
inference algorithm. 

Higher Ranked Type Systems In Section 2.1 we used the 
equivalence between System F and FCP to write functions that required 
higher ranked types but for which types could still be inferred. We 
have explained that the FCP approach is equivalent to using type 
annotation. There are other modern higher-ranked type systems that 
may be amenable to the same type of manipulation to cope with 
spine types, such as HMF by Leijen [13] and the practical higher 
ranked inference of Peyton Jones et al. [17]. We expect that there is 
nothing in the spine view, or the is-pair expression, that would 
conflict with the type inference in either of these systems. The main 
challenge would be ensuring the existentially quantified type 
variable introduced by the is-pair expression is treated as such. 

6. Future Work 

dgen describes a full implementation of generics and this work 
was a result of taking one part of dgen’s implementation and 
formalising it. We plan to repeat this process for other aspects, 
in particular the run-time representation and the system of type 
indexed functions. 

Hinze and Löh [3] have shown how the spine view can be 
extended to support generic producers; it is our intention to apply 
their work to FCP q . 

Presently we have an implementation of the ideas in this paper 
(in dgen) and separately we have proofs of various properties 
of that system. While the simplicity of FCP q makes this approach 
tractable, it will quickly become unsupportable if the system is 
extended. Furthermore, it is currently not possible to prove properties 
of dgen’s implementation, nor is it possible to execute substantial 
examples in FCP q . A machine assisted proof of the properties in 
this paper with a system like Coq would allow the implementation 
and the theoretical basis to be unified and make future extensions 
of the system easier. 

7. Conclusion 

We have described a small extension of the lambda calculus that 
supports the spine view of data. We have given a type checking 
relation and described a type inference algorithm for this language 
that requires no extra type annotations to support the spine view. In 
this way we have created the smallest expression to date of the spine 
view of data and shown how it can be included in any functional 
programming language based on a Hindley-Milner type system. 
The type system has been proven sound and has been implemented 
in a compiler for a general purpose language. 